Efficient MLLR

Author

  • Matthew Gibson
Abstract

The need for close-to-real-time speech recognition has recently driven interest in fast LVCSR systems. Because of the time constraint, such systems often discard, where possible, sub-processes of the recognition process that demand relatively large amounts of computation and yield relatively small accuracy gains. This report focuses on such speed-accuracy tradeoffs with regard to speaker adaptation. A variety of techniques are used to reduce the compute time of a baseline adaptation system which uses mean-only maximum likelihood linear regression (MLLR). Two alternatives, the use of a mixture component-level Viterbi alignment to accumulate adaptation statistics and least squares linear regression transform estimation, are compared to the baseline techniques in terms of speed and accuracy. In the case of unsupervised adaptation, exploitation of word boundary information generated in an initial recognition pass is shown to further reduce adaptation time with no negative impact upon system accuracy. Also in the unsupervised case, an illustration is presented of how confidence scores can be used to simultaneously reduce adaptation time and improve accuracy.
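As a rough illustration of two ideas named in the abstract, the sketch below estimates a single mean transform by ordinary least squares from frames that have each been hard-assigned (as a Viterbi alignment would do) to one Gaussian component, and optionally drops frames whose confidence score is low. It is not the report's implementation: the function names, the single global regression class, the per-frame confidence values, and the `min_conf` threshold are all assumptions made for this example, and covariances are ignored, which is what distinguishes plain least squares from the maximum likelihood estimate.

```python
import numpy as np


def estimate_mean_transform(frames, aligned_means, confidences=None, min_conf=0.0):
    """Least-squares estimate of W (d x (d+1)) such that o_t ~= W @ [1, mu_t].

    frames        : (T, d) observation vectors o_t
    aligned_means : (T, d) mean mu_t of the component each frame is aligned to
    confidences   : optional (T,) scores; frames below min_conf are discarded
                    (hypothetical filtering rule, assumed for illustration)
    """
    frames = np.asarray(frames, dtype=float)
    means = np.asarray(aligned_means, dtype=float)
    if confidences is not None:
        keep = np.asarray(confidences, dtype=float) >= min_conf
        frames, means = frames[keep], means[keep]
    # Extended mean vectors [1, mu]: the leading 1 carries the bias term.
    xi = np.hstack([np.ones((means.shape[0], 1)), means])
    # Solve xi @ W.T ~= frames in the least-squares sense.
    w_t, *_ = np.linalg.lstsq(xi, frames, rcond=None)
    return w_t.T


def adapt_means(W, means):
    """Apply the transform to a set of means: mu_hat = W @ [1, mu]."""
    means = np.asarray(means, dtype=float)
    xi = np.hstack([np.ones((means.shape[0], 1)), means])
    return xi @ W.T
```

Because each frame contributes to exactly one component, the statistics reduce to simple sums over frames, which is one plausible reason a hard alignment is cheaper than accumulating soft posteriors over all components.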

Similar resources

Speaker adaptive training using shift-MLLR

In this paper, a novel method for speaker adaptive training (SAT) based on Gaussian mean offset adaptation, so-called shift-MLLR, is presented. The method differs from previous SAT methods, which use linear transformations of Gaussian means or features, in that only an offset vector is used for adaptation, while the number of regression classes is increased. This is shown to allo...

Speaker Adaptation in Continuous Speech Recognition Using MLLR-Based MAP Estimation

A variety of methods are used for speaker adaptation in speech recognition. In some techniques, such as MAP estimation, only the models with available training data are updated. Hence, large amounts of training data are required in order to have significant recognition improvements. In some others, such as MLLR, where several general transformations are applied to model clusters, the results ar...

Integration of MLLR adaptation with pronunciation proficiency adaptation for non-native speech recognition

To recognize non-native speech, larger acoustic/linguistic distortions must be handled adequately in acoustic modeling, language modeling, lexical modeling, and/or decoding strategy. In this paper, a novel method to enhance MLLR adaptation of acoustic models for non-native speech recognition is proposed. In the case of native speech recognition, MLLR speaker adaptation was successfully introduc...

Implementing Vocal Tract Le in the MLLR Fram

Vocal Tract Length Normalization (VTLN) and Maximum Likelihood Linear Regression (MLLR) are two approaches to reduce the degradation in speech recognition performance caused by variation of speakers. This paper derives a novel efficient adaptation algorithm from the two techniques. Based on prior knowledge of usual VTLN, an approximate constrained-form linear transformation is obtained. The tra...

Computationally Efficient Speaker Identification for Large Population Tasks using MLLR and Sufficient Statistics

In conventional speaker identification using the GMM-UBM framework, the likelihood of the given test utterance is computed with respect to all speaker models before the speaker is identified by the maximum likelihood criterion. Calculating the likelihood score of the test utterance is computationally intensive, especially when there are tens of thousands of speakers in the database. In this pap...

Publication date: 2004